*TITLE*
*Exploratory Data Analysis and Accident Reduction Strategies for Road Accidents in India*
*AUTHOR*
*S Pooja*
*📋 PROJECT SUMMARY:*
*This project focuses on analyzing accident data from India through Exploratory Data Analysis (EDA) techniques. The aim is to identify key patterns related to accident severity, states, time of day, road and weather conditions, alcohol involvement, and driver demographics. Based on the findings, strategic recommendations are proposed to reduce accidents, especially focusing on poor road conditions and under-construction zones.*
*Source Information:*
*The dataset contains 3,000 accident records spanning 2018 to 2023, with detailed attributes such as accident severity, weather conditions, road type, vehicle involvement, and casualties. It is sourced from the Kaggle dataset "India Road Accident Dataset Predictive Analysis" by Khushi Yadav.*
#Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# Load the Data
df = pd.read_csv('accident_prediction_india.csv')
df
| | State Name | City Name | Year | Month | Day of Week | Time of Day | Accident Severity | Number of Vehicles Involved | Vehicle Type Involved | Number of Casualties | ... | Road Type | Road Condition | Lighting Conditions | Traffic Control Presence | Speed Limit (km/h) | Driver Age | Driver Gender | Driver License Status | Alcohol Involvement | Accident Location Details |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jammu and Kashmir | Unknown | 2021 | May | Monday | 1:46 | Serious | 5 | Cycle | 0 | ... | National Highway | Wet | Dark | Signs | 61 | 66 | Male | NaN | Yes | Curve |
| 1 | Uttar Pradesh | Lucknow | 2018 | January | Wednesday | 21:30 | Minor | 5 | Truck | 5 | ... | Urban Road | Dry | Dusk | Signs | 92 | 60 | Male | NaN | Yes | Straight Road |
| 2 | Chhattisgarh | Unknown | 2023 | May | Wednesday | 5:37 | Minor | 5 | Pedestrian | 6 | ... | National Highway | Under Construction | Dawn | Signs | 120 | 26 | Female | NaN | No | Bridge |
| 3 | Uttar Pradesh | Lucknow | 2020 | June | Saturday | 0:31 | Minor | 3 | Bus | 10 | ... | State Highway | Dry | Dark | Signals | 76 | 34 | Female | Valid | Yes | Straight Road |
| 4 | Sikkim | Unknown | 2021 | August | Thursday | 11:21 | Minor | 5 | Cycle | 7 | ... | Urban Road | Wet | Dusk | Signs | 115 | 30 | Male | NaN | No | Intersection |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2995 | Tamil Nadu | Chennai | 2021 | January | Sunday | 1:15 | Minor | 5 | Truck | 4 | ... | National Highway | Wet | Dark | Signs | 74 | 43 | Male | Expired | Yes | Intersection |
| 2996 | Uttarakhand | Unknown | 2018 | July | Sunday | 10:12 | Fatal | 3 | Car | 3 | ... | Urban Road | Under Construction | Daylight | NaN | 86 | 23 | Female | NaN | Yes | Intersection |
| 2997 | Meghalaya | Unknown | 2021 | January | Thursday | 19:34 | Minor | 2 | Two-Wheeler | 8 | ... | National Highway | Dry | Dark | Signs | 47 | 57 | Female | Valid | Yes | Intersection |
| 2998 | Meghalaya | Unknown | 2023 | June | Sunday | 20:54 | Fatal | 1 | Cycle | 9 | ... | Urban Road | Under Construction | Daylight | Signs | 60 | 28 | Female | Expired | Yes | Bridge |
| 2999 | Arunachal Pradesh | Unknown | 2020 | September | Monday | 7:19 | Fatal | 5 | Cycle | 1 | ... | National Highway | Under Construction | Daylight | NaN | 40 | 66 | Male | NaN | Yes | Bridge |
3000 rows × 22 columns
#View First Few Rows (Head)
df.head()
| | State Name | City Name | Year | Month | Day of Week | Time of Day | Accident Severity | Number of Vehicles Involved | Vehicle Type Involved | Number of Casualties | ... | Road Type | Road Condition | Lighting Conditions | Traffic Control Presence | Speed Limit (km/h) | Driver Age | Driver Gender | Driver License Status | Alcohol Involvement | Accident Location Details |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jammu and Kashmir | Unknown | 2021 | May | Monday | 1:46 | Serious | 5 | Cycle | 0 | ... | National Highway | Wet | Dark | Signs | 61 | 66 | Male | NaN | Yes | Curve |
| 1 | Uttar Pradesh | Lucknow | 2018 | January | Wednesday | 21:30 | Minor | 5 | Truck | 5 | ... | Urban Road | Dry | Dusk | Signs | 92 | 60 | Male | NaN | Yes | Straight Road |
| 2 | Chhattisgarh | Unknown | 2023 | May | Wednesday | 5:37 | Minor | 5 | Pedestrian | 6 | ... | National Highway | Under Construction | Dawn | Signs | 120 | 26 | Female | NaN | No | Bridge |
| 3 | Uttar Pradesh | Lucknow | 2020 | June | Saturday | 0:31 | Minor | 3 | Bus | 10 | ... | State Highway | Dry | Dark | Signals | 76 | 34 | Female | Valid | Yes | Straight Road |
| 4 | Sikkim | Unknown | 2021 | August | Thursday | 11:21 | Minor | 5 | Cycle | 7 | ... | Urban Road | Wet | Dusk | Signs | 115 | 30 | Male | NaN | No | Intersection |
5 rows × 22 columns
#View Last Few Rows (Tail)
df.tail()
| | State Name | City Name | Year | Month | Day of Week | Time of Day | Accident Severity | Number of Vehicles Involved | Vehicle Type Involved | Number of Casualties | ... | Road Type | Road Condition | Lighting Conditions | Traffic Control Presence | Speed Limit (km/h) | Driver Age | Driver Gender | Driver License Status | Alcohol Involvement | Accident Location Details |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2995 | Tamil Nadu | Chennai | 2021 | January | Sunday | 1:15 | Minor | 5 | Truck | 4 | ... | National Highway | Wet | Dark | Signs | 74 | 43 | Male | Expired | Yes | Intersection |
| 2996 | Uttarakhand | Unknown | 2018 | July | Sunday | 10:12 | Fatal | 3 | Car | 3 | ... | Urban Road | Under Construction | Daylight | NaN | 86 | 23 | Female | NaN | Yes | Intersection |
| 2997 | Meghalaya | Unknown | 2021 | January | Thursday | 19:34 | Minor | 2 | Two-Wheeler | 8 | ... | National Highway | Dry | Dark | Signs | 47 | 57 | Female | Valid | Yes | Intersection |
| 2998 | Meghalaya | Unknown | 2023 | June | Sunday | 20:54 | Fatal | 1 | Cycle | 9 | ... | Urban Road | Under Construction | Daylight | Signs | 60 | 28 | Female | Expired | Yes | Bridge |
| 2999 | Arunachal Pradesh | Unknown | 2020 | September | Monday | 7:19 | Fatal | 5 | Cycle | 1 | ... | National Highway | Under Construction | Daylight | NaN | 40 | 66 | Male | NaN | Yes | Bridge |
5 rows × 22 columns
#Check Data Shape
df.shape
(3000, 22)
#Check Data Types (dtypes)
df.dtypes
| Column | dtype |
|---|---|
| State Name | object |
| City Name | object |
| Year | int64 |
| Month | object |
| Day of Week | object |
| Time of Day | object |
| Accident Severity | object |
| Number of Vehicles Involved | int64 |
| Vehicle Type Involved | object |
| Number of Casualties | int64 |
| Number of Fatalities | int64 |
| Weather Conditions | object |
| Road Type | object |
| Road Condition | object |
| Lighting Conditions | object |
| Traffic Control Presence | object |
| Speed Limit (km/h) | int64 |
| Driver Age | int64 |
| Driver Gender | object |
| Driver License Status | object |
| Alcohol Involvement | object |
| Accident Location Details | object |
dtype: object
#Check for Duplicate Rows
df.duplicated().sum()
0
#Check for Missing Values
df.isnull().sum()
| Column | Missing values |
|---|---|
| State Name | 0 |
| City Name | 0 |
| Year | 0 |
| Month | 0 |
| Day of Week | 0 |
| Time of Day | 0 |
| Accident Severity | 0 |
| Number of Vehicles Involved | 0 |
| Vehicle Type Involved | 0 |
| Number of Casualties | 0 |
| Number of Fatalities | 0 |
| Weather Conditions | 0 |
| Road Type | 0 |
| Road Condition | 0 |
| Lighting Conditions | 0 |
| Traffic Control Presence | 716 |
| Speed Limit (km/h) | 0 |
| Driver Age | 0 |
| Driver Gender | 0 |
| Driver License Status | 975 |
| Alcohol Involvement | 0 |
| Accident Location Details | 0 |
dtype: int64
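The raw counts can be easier to compare as percentages of all rows. The following is a minimal sketch of that calculation; `toy` is a small made-up frame standing in for `df`, and `missing_pct` is a name introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the accident DataFrame `df`.
toy = pd.DataFrame({
    "Traffic Control Presence": ["Signs", np.nan, "Signals", np.nan],
    "Driver License Status": ["Valid", np.nan, "Expired", "Valid"],
    "Driver Age": [25, 40, 33, 58],
})

# isnull().mean() gives the fraction of missing values in each column.
missing_pct = (toy.isnull().mean() * 100).round(1)

# Keep only the columns that actually have gaps, largest share first.
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```

Applied to `df`, the counts reported above (716 and 975 out of 3,000 rows) correspond to roughly 23.9% and 32.5% of the records.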
plt.figure(figsize=(16,6))
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')
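#Replace missing values with "Unknown"
#Assign the result back instead of calling fillna(inplace=True) on a column selection,
#which newer pandas versions warn about and which may not modify df.
df['Traffic Control Presence'] = df['Traffic Control Presence'].fillna('Unknown')
df['Driver License Status'] = df['Driver License Status'].fillna('Unknown')
#Re-plot the heatmap to confirm that no missing values remain
plt.figure(figsize=(16,6))
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')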
#Statistical Summary (Describe)
df.describe()
| | Year | Number of Vehicles Involved | Number of Casualties | Number of Fatalities | Speed Limit (km/h) | Driver Age |
|---|---|---|---|---|---|---|
| count | 3000.000000 | 3000.000000 | 3000.000000 | 3000.000000 | 3000.000000 | 3000.00000 |
| mean | 2020.530000 | 2.996000 | 5.066000 | 2.455333 | 74.940667 | 44.17700 |
| std | 1.683858 | 1.428285 | 3.214097 | 1.717650 | 26.765088 | 15.40286 |
| min | 2018.000000 | 1.000000 | 0.000000 | 0.000000 | 30.000000 | 18.00000 |
| 25% | 2019.000000 | 2.000000 | 2.000000 | 1.000000 | 51.000000 | 31.00000 |
| 50% | 2021.000000 | 3.000000 | 5.000000 | 2.000000 | 75.000000 | 45.00000 |
| 75% | 2022.000000 | 4.000000 | 8.000000 | 4.000000 | 99.000000 | 57.00000 |
| max | 2023.000000 | 5.000000 | 10.000000 | 5.000000 | 120.000000 | 70.00000 |
df.describe(include='object')
| | State Name | City Name | Month | Day of Week | Time of Day | Accident Severity | Vehicle Type Involved | Weather Conditions | Road Type | Road Condition | Lighting Conditions | Traffic Control Presence | Driver Gender | Driver License Status | Alcohol Involvement | Accident Location Details |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 |
| unique | 32 | 28 | 12 | 7 | 1263 | 3 | 7 | 5 | 4 | 4 | 4 | 4 | 2 | 3 | 2 | 4 |
| top | Goa | Unknown | March | Wednesday | 8:34 | Minor | Truck | Rainy | State Highway | Under Construction | Dark | Signs | Female | Valid | Yes | Intersection |
| freq | 109 | 2138 | 266 | 468 | 7 | 1034 | 449 | 631 | 771 | 778 | 763 | 812 | 1563 | 1057 | 1520 | 789 |
#Unique Value Counts for Categorical Columns
categorical_columns = df.select_dtypes(include=['object'])
unique_counts = categorical_columns.nunique()
print(unique_counts)
| Column | Unique values |
|---|---|
| State Name | 32 |
| City Name | 28 |
| Month | 12 |
| Day of Week | 7 |
| Time of Day | 1263 |
| Accident Severity | 3 |
| Vehicle Type Involved | 7 |
| Weather Conditions | 5 |
| Road Type | 4 |
| Road Condition | 4 |
| Lighting Conditions | 4 |
| Traffic Control Presence | 4 |
| Driver Gender | 2 |
| Driver License Status | 3 |
| Alcohol Involvement | 2 |
| Accident Location Details | 4 |
dtype: int64
Time of Day has 1,263 unique values because it records the exact time of each accident, so it is more useful when grouped by hour, as done in the hourly analysis later.
📊 Visual Exploratory Data Analysis (EDA)
Accident Severity Distribution
fig = px.pie(df, names='Accident Severity', title='Accident Severity Distribution')
fig.show()
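The pie chart shows the proportions visually; the same shares can also be read off as numbers. This is a minimal sketch on made-up data, where `toy` and `severity_share` are illustrative names and only the column name comes from the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for `df`; only the column name is real.
toy = pd.DataFrame({
    "Accident Severity": ["Minor", "Minor", "Serious", "Fatal",
                          "Minor", "Fatal", "Serious", "Minor"],
})

# Percentage of records in each severity class, most frequent first.
severity_share = (
    toy["Accident Severity"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(severity_share)
```

On the full dataset, the `describe(include='object')` output above puts Minor at 1,034 of 3,000 records, which is about 34.5%.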
Accidents by State
df_counts = df['State Name'].value_counts()
fig = px.bar(x=df_counts.index, y=df_counts.values,
             labels={'x': 'State', 'y': 'Accidents'},
             title='Accidents by State')
fig.show()
Accidents by Weather Condition
df_counts = df['Weather Conditions'].value_counts().reset_index()
df_counts.columns = ['Weather', 'Accidents']
fig = px.bar(df_counts, x='Weather', y='Accidents',
             labels={'Weather': 'Weather', 'Accidents': 'Accidents'},
             title='Weather Conditions in Accidents',
             color='Weather')
fig.show()
Accidents by Road Condition
df_counts = df['Road Condition'].value_counts().reset_index()
df_counts.columns = ['Road Condition', 'Accidents']
fig = px.bar(
    df_counts,
    x='Road Condition',
    y='Accidents',
    labels={'Road Condition': 'Road Condition', 'Accidents': 'Accidents'},
    title='Road Condition During Accidents',
    color='Road Condition'
)
fig.show()
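The bar chart counts accidents per road condition but does not show how severe those accidents are. One possible follow-up, sketched here on made-up data, is a row-normalized cross-tabulation of road condition against severity. The names `toy` and `severity_by_road` are illustrative assumptions; the column names match the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for `df`.
toy = pd.DataFrame({
    "Road Condition": ["Dry", "Dry", "Wet", "Wet",
                       "Under Construction", "Under Construction",
                       "Under Construction", "Under Construction"],
    "Accident Severity": ["Minor", "Fatal", "Minor", "Minor",
                          "Fatal", "Fatal", "Serious", "Minor"],
})

# normalize="index" turns each row into shares that sum to 1,
# so road conditions with different record counts become comparable.
severity_by_road = pd.crosstab(
    toy["Road Condition"],
    toy["Accident Severity"],
    normalize="index",
).mul(100).round(1)
print(severity_by_road)
```

Each row then reads as "of the accidents on this road condition, what percentage were fatal, minor, or serious", which is closer to the question the recommendations care about than the raw counts.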
Alcohol Involvement in Accidents
df_counts = df['Alcohol Involvement'].value_counts().reset_index()
df_counts.columns = ['Alcohol Involvement', 'Count']
fig = px.bar(
    df_counts,
    x='Alcohol Involvement',
    y='Count',
    labels={'Alcohol Involvement': 'Alcohol Involvement', 'Count': 'Count'},
    title='Alcohol Involvement in Accidents',
    color='Alcohol Involvement'
)
fig.show()
Accidents by Hour of Day
df['Hour'] = pd.to_datetime(df['Time of Day'], format='%H:%M', errors='coerce').dt.hour
fig = px.histogram(df, x='Hour', nbins=24, title='Accidents by Hour of Day')
fig.update_traces(marker=dict(line=dict(color='black', width=1)))
fig.show()
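To name the busiest hours rather than read them off the histogram, the parsed hours can be counted directly. This is a minimal sketch on made-up times; `toy`, `hours`, `hourly_counts`, and `top_hours` are illustrative names, and the `"bad"` entry is included only to show what `errors='coerce'` does with an unparseable value.

```python
import pandas as pd

# Hypothetical toy data standing in for `df`.
toy = pd.DataFrame({
    "Time of Day": ["1:46", "21:30", "5:37", "0:31",
                    "5:05", "21:59", "5:59", "bad"],
})

# Same parsing as in the cell above: unparseable strings become NaT,
# so their hour is NaN instead of raising an error.
hours = pd.to_datetime(
    toy["Time of Day"], format="%H:%M", errors="coerce"
).dt.hour

# Count accidents per hour, ordered by hour of day.
hourly_counts = hours.dropna().astype(int).value_counts().sort_index()

# The two hours with the most accidents in this toy sample.
top_hours = hourly_counts.nlargest(2)
print(top_hours)
```

Checking `hours.isna().sum()` on the real data is a quick way to see whether any time strings failed to parse and were silently dropped from the histogram.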
Driver's Age Distribution
fig = px.histogram(df, x='Driver Age', nbins=30, title="Driver's Age Distribution")
fig.update_traces(marker=dict(line=dict(color='black', width=1)))
fig.show()
Correlation Heatmap between Numerical Features
plt.figure(figsize=(7,5))
corr = df[['Number of Vehicles Involved', 'Number of Casualties', 'Number of Fatalities', 'Speed Limit (km/h)', 'Driver Age']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap between Numerical Features')
plt.show()
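The heatmap shows every pair twice and includes the diagonal, so a ranked list of unique pairs can be easier to scan. This is a minimal sketch on made-up numbers; `toy` and `pairs` are illustrative names, and the toy values were chosen so that the correlations are easy to verify by hand.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data standing in for the numeric columns of `df`.
toy = pd.DataFrame({
    "Number of Casualties": [2, 4, 6, 8, 10],
    "Number of Fatalities": [1, 2, 3, 4, 5],
    "Speed Limit (km/h)": [60, 40, 80, 50, 70],
})
corr = toy.corr()

# One entry per unordered pair of columns, skipping the diagonal.
pairs = pd.Series({
    (a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)
})

# Rank by absolute strength so strong negative correlations are not buried.
pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(pairs)
```

In this toy sample, casualties and fatalities are perfectly correlated by construction, while speed limit correlates with each of them at 0.3.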
*Summary of Key Insights*
*1. Accident Severity Distribution:*
Minor accidents are the most common single class (1,034 of 3,000 records, about 34%). Serious and fatal accidents together make up the remaining 1,966 records (about 66%), so they need targeted action.
*2. High Accident States:*
Goa (109 records), Delhi, Sikkim, and a few other states show the highest accident counts.
*3. Weather Conditions:*
Rainy weather is the most frequent condition (631 of 3,000 records). Stormy and Hazy conditions also contribute significantly.
*4. Road Conditions:*
Under-construction roads account for the largest share of accidents (778 of 3,000 records), but wet, dry, and damaged roads are risky too.
*5. Alcohol Involvement:*
About half of the accidents involve alcohol (1,520 of 3,000 records), which points to a strong need for anti-drunk-driving measures.
*6. Time of Day:*
Accidents peak around 3 AM, from 5 AM to 8 AM, and from 9 PM to 10 PM. The rise in the early morning and late evening may reflect fatigue, poor visibility, alcohol influence, overspeeding, and reduced alertness.
*7. Driver Demographics:*
Driver involvement peaks at ages 18 to 19 and 44 to 45. Targeted awareness campaigns are needed for these groups.
*8. Correlations:*
Number of Fatalities is positively correlated with Number of Casualties. No strong direct correlation appears with speed limit or driver age.
*Recommendations for Reducing Road Accidents in India*
*1. Stricter Law Enforcement in High-Accident States:*
Focus the deployment of traffic police and automated surveillance (e.g., speed cameras, red-light cameras) on the states with the highest accident counts.
Increase fines and implement stricter penalties for violations such as overspeeding, rash driving, and failure to wear seatbelts/helmets.
Conduct regular road safety audits and enforce corrective measures immediately.
*2. Nighttime Lighting Improvements and Monitoring:*
Install better street lighting on highways, rural roads, and accident-prone urban intersections to improve night visibility.
Use smart lighting systems that adjust brightness based on weather and traffic conditions.
Increase night-time patrolling and set up sobriety checkpoints to catch drowsy or intoxicated drivers.
*3. Anti-Alcohol Driving Campaigns:*
Launch nationwide awareness campaigns highlighting the dangers of drinking and driving, especially targeting festive seasons and weekends.
Implement stricter blood alcohol concentration (BAC) limits and conduct frequent random breathalyzer tests.
Collaborate with bars, restaurants, and event organizers to promote designated driver programs and encourage safe alternatives like ride-sharing.
*4. Weather-Related Driving Alerts:*
Develop real-time weather advisory systems integrated into GPS apps and highway signboards to warn drivers about fog, rain, or poor road conditions.
Enforce speed limit reductions during adverse weather and provide designated safe parking zones during extreme weather events.
Educate drivers on safe driving techniques in different weather conditions through licensing programs and public service announcements.
*5. Clear Advance Warning at Construction Zones:*
Install large, highly visible warning signs several hundred meters before each construction site.
Use flashing lights, reflective signs, and electronic boards, especially at night or in low-visibility conditions.
*6. Training Programs for Younger and Middle-Aged Drivers:*
Introduce mandatory defensive driving courses for young drivers (under 25) and middle-aged drivers (30–45), who show higher accident involvement in this analysis.
Provide regular refresher courses for commercial drivers and fleet operators.
Incentivize participation through discounts on insurance premiums or tax benefits for individuals completing certified safety training.
*Conclusion*
Through data-driven insights, the analysis identifies major factors contributing to road accidents in India. The project recommends a combination of infrastructure improvements, stricter law enforcement, driver education, and real-time hazard management to significantly reduce accident rates and save lives.